

Reliable and Trustworthy Machine Learning for Health Using Dataset Shift Detection

Neural Information Processing Systems

Unpredictable ML model behavior on unseen data, especially in the health domain, raises serious concerns about safety, as the repercussions of mistakes can be fatal. In this paper, we explore the feasibility of using state-of-the-art out-of-distribution detectors for reliable and trustworthy diagnostic predictions. We select publicly available deep learning models for various health conditions (e.g., skin cancer, lung sound, and Parkinson's disease) that use various input data types (e.g., image, audio, and motion data). We demonstrate that these models make unreasonable predictions on out-of-distribution datasets. We show that Mahalanobis distance- and Gram matrices-based out-of-distribution detection methods can detect out-of-distribution data with high accuracy for health models operating on different modalities. We then translate the out-of-distribution score into a human-interpretable confidence score and investigate its effect on users' interaction with health ML applications. Our user study shows that the confidence score helped participants trust only results with a high score when making a medical decision and disregard results with a low score. Through this work, we demonstrate that dataset shift is a critical piece of information for high-stakes ML applications, such as medical diagnosis and healthcare, to provide reliable and trustworthy predictions to users.
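The Mahalanobis-distance detector mentioned above can be sketched in a few lines. This is an illustrative single-Gaussian version over feature vectors (the detectors studied in the paper operate on class-conditional deep-feature statistics across network layers), and all function names are hypothetical:

```python
import numpy as np

def fit_gaussian(features):
    """Fit mean and precision matrix on in-distribution feature vectors."""
    mu = features.mean(axis=0)
    cov = np.cov(features, rowvar=False)
    precision = np.linalg.pinv(cov)  # pseudo-inverse for numerical safety
    return mu, precision

def mahalanobis_score(x, mu, precision):
    """Squared Mahalanobis distance; larger means more out-of-distribution."""
    d = x - mu
    return float(d @ precision @ d)
```

A threshold on this score (calibrated on held-out in-distribution data) can then be mapped to a user-facing confidence score, as the paper does with its detectors.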


Automated File-Level Logging Generation for Machine Learning Applications using LLMs: A Case Study using GPT-4o Mini

Rodriguez, Mayra Sofia Ruiz, Khatoonabadi, SayedHassan, Shihab, Emad

arXiv.org Artificial Intelligence

Logging is essential in software development, helping developers monitor system behavior and aiding in debugging applications. Given the ability of large language models (LLMs) to generate natural language and code, researchers are exploring their potential to generate log statements. However, prior work focuses on evaluating logs introduced in code functions, leaving file-level log generation underexplored -- especially in machine learning (ML) applications, where comprehensive logging can enhance reliability. In this study, we evaluate the capacity of GPT-4o mini, as a case study, to generate log statements for ML projects at the file level. We gathered a set of 171 ML repositories containing 4,073 Python files with at least one log statement. We identified and removed the original logs from the files, prompted the LLM to generate logs for them, and evaluated the position, log level, variables, and text quality of the generated logs against the human-written logs. In addition, we manually analyzed a representative sample of generated logs to identify common patterns and challenges. We find that the LLM introduces logs in the same place as humans in 63.91% of cases, but at the cost of a high overlogging rate of 82.66%. Furthermore, our manual analysis reveals challenges for file-level logging, including overlogging at the beginning or end of a function, difficulty logging within large code blocks, and misalignment with project-specific logging conventions. While the LLM shows promise for generating logs for complete files, these limitations remain to be addressed for practical implementation.
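The position-match and overlogging measurements above can be illustrated with a toy metric over sets of log-statement line positions. This is a hypothetical sketch, not the study's exact evaluation protocol:

```python
def position_agreement(human_positions, generated_positions):
    """Compare log placement between human and generated logs.

    Returns (agreement, overlogging):
      agreement  - fraction of human-chosen positions the model also chose
      overlogging - fraction of generated positions with no human counterpart
    """
    h, g = set(human_positions), set(generated_positions)
    agreement = len(h & g) / len(h) if h else 0.0
    overlogging = len(g - h) / len(g) if g else 0.0
    return agreement, overlogging
```

Under a metric like this, a model can simultaneously score high on agreement and high on overlogging, which matches the paper's finding of 63.91% position agreement alongside an 82.66% overlogging rate.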


Automatic Identification of Machine Learning-Specific Code Smells

Hamfelt, Peter, Britto, Ricardo, Rocha, Lincoln, Almendra, Camilo

arXiv.org Artificial Intelligence

Machine learning (ML) has rapidly grown in popularity, becoming vital to many industries. Currently, research on code smells in ML applications lacks tools and studies that address the identification and validity of ML-specific code smells. This work investigates suitable methods and tools to design and develop a static code analysis tool (MLpylint) based on code smell criteria. This research employed the Design Science Methodology. In the problem identification phase, a literature review was conducted to identify ML-specific code smells. In the solution design phase, a secondary literature review and consultations with experts were performed to select methods and tools for implementing the tool. We evaluated the tool on data from 160 open-source ML applications sourced from GitHub. We also conducted a static validation through an expert survey involving 15 ML professionals. The results indicate the effectiveness and usefulness of MLpylint. We aim to extend our current approach by investigating ways to introduce MLpylint seamlessly into development workflows, fostering a more productive and innovative developer environment.


Sustainable Machine Learning Retraining: Optimizing Energy Efficiency Without Compromising Accuracy

Poenaru-Olaru, Lorena, Sallou, June, Cruz, Luis, Rellermeyer, Jan, van Deursen, Arie

arXiv.org Artificial Intelligence

The reliability of machine learning (ML) software systems is heavily influenced by changes in data over time. For that reason, ML systems require regular maintenance, typically based on model retraining. However, retraining imposes significant computational demand, which makes it energy-intensive and raises concerns about its environmental impact. To understand which retraining techniques should be considered when designing sustainable ML applications, in this work we study the energy consumption of common retraining techniques. Since the accuracy of ML systems is also essential, we compare retraining techniques in terms of both energy efficiency and accuracy. We show that retraining with only the most recent data, compared to all available data, reduces energy consumption by up to 25%, making it a sustainable alternative to the status quo. Furthermore, our findings show that retraining a model only when there is evidence that updates are necessary, rather than on a fixed schedule, can reduce energy consumption by up to 40%, provided a reliable data change detector is in place. Our findings pave the way for better recommendations for ML practitioners, guiding them toward more energy-efficient retraining techniques when designing sustainable ML software systems. The increasing adoption of Machine Learning (ML) and Artificial Intelligence (AI) within organizations has resulted in the development of more ML/AI software systems [1]. Although ML/AI brings plenty of business value, it is known that the accuracy of ML applications decreases over time [2]. Thus, ML developers must monitor and maintain their ML systems in production. One reason for this phenomenon is that ML applications are highly dependent on the data on which they have been trained. Real-world data usually changes over time [3] - a phenomenon often referred to as concept drift [4] - which can significantly impact the normal operation of ML systems [5].
Therefore, appropriate maintenance techniques are required for the design of ML software systems. One common approach to maintaining these systems is to periodically update these applications by retraining the underlying ML models with the latest version of the data [6], [7]. On another note, the process of training machine learning models has raised substantial concerns about the carbon footprint of ML applications [8], [9].
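The two energy-saving strategies described above (retraining only on a recent data window, and only when a change is detected) can be combined in a maintenance loop roughly like the following. The mean-shift check is a deliberately naive stand-in for a real drift detector (e.g., a statistical test over feature distributions), and all names are hypothetical:

```python
import numpy as np

def drift_detected(reference, recent, threshold=0.5):
    """Toy drift check: flag when any feature mean shifts by more than
    `threshold` between the reference window and the new batch."""
    shift = np.abs(recent.mean(axis=0) - reference.mean(axis=0))
    return bool((shift > threshold).any())

def maybe_retrain(train_fn, history, new_batch, window=1000, threshold=0.5):
    """Retrain on only the most recent `window` samples, and only when
    the drift check fires; otherwise keep the current model (None)."""
    reference = history[-window:]
    if drift_detected(reference, new_batch, threshold):
        data = np.concatenate([history, new_batch])[-window:]
        return train_fn(data)
    return None  # no evidence an update is needed; skip the retraining cost
```

Skipping `train_fn` entirely when no drift is flagged is where the energy savings reported in the paper come from, which is also why the reliability of the detector matters so much.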


A note on the physical interpretation of neural PDE's

Succi, Sauro

arXiv.org Artificial Intelligence

Machine Learning (ML) has taken science (and society) by storm in the last decade, with numerous applications that seem to defy our best theoretical and modeling tools [11]. Leaving aside a significant amount of hype, ML raises a number of genuine hopes of countering some of the most vexing challenges for the scientific method, particularly the curse of dimensionality [2]. This, however, does not come for free; in particular, the current trend toward an astronomical number of parameters (trillions in the case of recent chatbots), none of which lends itself to a direct physical interpretation, together with an unsustainable power demand, begs for a change of strategy: fewer weights and more insight [17]. In this paper, we present an attempt along this line. In particular, by highlighting the one-to-one mapping between ML procedures and discrete dynamical systems, we suggest that ML could be conducted by means of a restricted and more economical class of weight matrices, each of which can be interpreted as a specific information-propagation process.


On Model Parallelization and Scheduling Strategies for Distributed Machine Learning

Seunghak Lee, Jin Kyu Kim, Xun Zheng, Qirong Ho, Garth A. Gibson, Eric P. Xing

Neural Information Processing Systems

Distributed machine learning has typically been approached from a data parallel perspective, where big data are partitioned to multiple workers and an algorithm is executed concurrently over different data subsets under various synchronization schemes to ensure speed-up and/or correctness. A sibling problem that has received relatively less attention is how to ensure efficient and correct model parallel execution of ML algorithms, where parameters of an ML program are partitioned to different workers and undergo concurrent iterative updates. We argue that model and data parallelism impose rather different challenges for system design, algorithmic adjustment, and theoretical analysis. In this paper, we develop a system for model parallelism, STRADS, that provides a programming abstraction for scheduling parameter updates by discovering and leveraging changing structural properties of ML programs. STRADS enables a flexible tradeoff between scheduling efficiency and fidelity to intrinsic dependencies within the models, and improves the memory efficiency of distributed ML. We demonstrate the efficacy of model-parallel algorithms implemented on STRADS versus popular implementations for topic modeling, matrix factorization, and Lasso.


Reviews: Faster width-dependent algorithm for mixed packing and covering LPs

Neural Information Processing Systems

This paper shows that mixed packing and covering linear programs can be ε-approximated in O(1/ε) iterations. This work builds on Sherman's approach to max flow. This is a strong result on an important class of problems. Optimization problems of this nature arise frequently in machine learning applications, and reducing the ε dependence from quadratic to linear is a significant improvement. The reviewers liked the result and the presentation and strongly supported acceptance.


Perceived Fairness of the Machine Learning Development Process: Concept Scale Development

Mishra, Anoop, Khazanchi, Deepak

arXiv.org Artificial Intelligence

In machine learning (ML) applications, unfairness arises from bias in the data, the data curation process, erroneous assumptions, and implicit bias introduced during the development process. It is also well accepted by researchers that fairness in ML application development is highly subjective, with little clarity about what it means from an ML development and implementation perspective. Thus, in this research, we investigate and formalize the notion of the perceived fairness of ML development from a sociotechnical lens. Our goal in this research is to understand the characteristics of perceived fairness in ML applications. We address this research goal using a three-pronged strategy: 1) conducting virtual focus groups with ML developers, 2) reviewing existing literature on fairness in ML, and 3) incorporating aspects of justice theory relating to procedural and distributive justice. Based on our theoretical exposition, we propose transparency, accountability, and representativeness as the operational attributes of perceived fairness. These are described in terms of multiple concepts that comprise each dimension of perceived fairness. We use this operationalization to empirically validate the notion of perceived fairness of machine learning (ML) applications from both ML practitioners' and users' perspectives. The multidimensional framework for perceived fairness offers a comprehensive understanding of perceived fairness, which can guide the creation of fair ML systems with positive implications for society and businesses.


TAACKIT: Track Annotation and Analytics with Continuous Knowledge Integration Tool

Lee, Lily, Fontes, Julian, Weinert, Andrew, Schomacker, Laura, Stabile, Daniel, Hou, Jonathan

arXiv.org Artificial Intelligence

Machine learning (ML) is a powerful tool for efficiently analyzing data, detecting patterns, and forecasting trends across various domains such as text, audio, and images. The availability of annotation tools to generate reliably annotated data is crucial for advances in ML applications. In the domain of geospatial tracks, the lack of such tools to annotate and validate data impedes rapid and accessible ML application development. This paper presents Track Annotation and Analytics with Continuous Knowledge Integration Tool (TAACKIT) to serve the critically important functions of annotating geospatial track data and validating ML models. We demonstrate an ML application use case in the air traffic domain to illustrate its data annotation and model evaluation power and quantify the annotation effort reduction.


Exploring the Use of Machine Learning Weather Models in Data Assimilation

Tian, Xiaoxu, Holdaway, Daniel, Kleist, Daryl

arXiv.org Artificial Intelligence

The use of machine learning (ML) models in meteorology has attracted significant attention for their potential to improve weather forecasting efficiency and accuracy. GraphCast and NeuralGCM, two promising ML-based weather models, are at the forefront of this innovation. However, their suitability for data assimilation (DA) systems, particularly for four-dimensional variational (4DVar) DA, remains underexplored. This study evaluates the tangent linear (TL) and adjoint (AD) models of both GraphCast and NeuralGCM to assess their viability for integration into a DA framework. We compare the TL/AD results of GraphCast and NeuralGCM with those of the Model for Prediction Across Scales - Atmosphere (MPAS-A), a well-established numerical weather prediction (NWP) model. The comparison focuses on the physical consistency and reliability of TL/AD responses to perturbations. While the adjoint results of both GraphCast and NeuralGCM show some similarity to those of MPAS-A, they also exhibit unphysical noise at various vertical levels, raising concerns about their robustness for operational DA systems. The implications of this study extend beyond 4DVar applications. Unphysical behavior and noise in ML-derived TL/AD models could also lead to inaccurate error covariances and unreliable ensemble forecasts, potentially degrading the overall performance of ensemble-based DA systems. Addressing these challenges is critical to ensuring that ML models, such as GraphCast and NeuralGCM, can be effectively integrated into operational DA systems, paving the way for more accurate and efficient weather predictions.
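The internal consistency of a TL/AD pair is commonly verified with the dot-product (adjoint) test: for a tangent-linear operator M and its claimed adjoint, the identity <Mv, w> = <v, M^T w> must hold to machine precision. A minimal sketch with explicit linear operators (not the actual GraphCast/NeuralGCM code, whose TL/AD models are obtained by differentiating the full forecast step) follows:

```python
import numpy as np

def dot_product_test(tlm, adj, n_in, n_out, rng, tol=1e-10):
    """TL/AD consistency check: <M v, w> must equal <v, M^T w>
    for random perturbation vectors v and w."""
    v = rng.normal(size=n_in)
    w = rng.normal(size=n_out)
    return abs(np.dot(tlm(v), w) - np.dot(v, adj(w))) < tol
```

Passing this test is necessary but not sufficient for DA use: a TL/AD pair can be exactly self-consistent and still produce the physically implausible perturbation responses that the study observes.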